Data Reuse and Parallelism in Hardware Compilation
Abstract
This thesis presents a methodology for automatically determining, at compile time, a data memory organisation that exploits data reuse and loop-level parallelization, in order to achieve high-performance, low-power designs for data-dominated applications. Moore’s Law has enabled more and more heterogeneous components to be integrated on a single chip. However, extracting maximum performance from these hardware resources efficiently remains a challenge. Unlike previous approaches, which mainly focus on making efficient use of computational resources, our focus is on data memory organisation and input-output bandwidth considerations, which are typical stumbling blocks of existing hardware compilation schemes. To optimize accesses to large off-chip memories, an approach is adopted and formalized to identify opportunities for data reuse in local scratch-pad memory. An approach is then presented for evaluating different data reuse options in terms of the memory space required to buffer the reused data and the execution time needed to load the data into the local memories. Determining the data reuse design option that consumes the least power or performs operations quickest with respect to a memory constraint is an NP-hard problem. In this work, the problem of data reuse exploration for low-power designs is formulated as a Multiple-Choice Knapsack problem. Together with a proposed power model, the problem is solved efficiently. An integer geometric programming framework is presented for exploring data reuse and loop-level parallelization within a single step. The objective is to find the design that achieves the shortest execution time for an application. We describe our approaches, which are based on formal optimization techniques, and present results from applying them to several benchmarks; these show the advantages of optimizing data memory organisation and of exposing the interaction between data memory system design and parallelism extraction to the compiler.
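As a rough illustration of the Multiple-Choice Knapsack formulation mentioned in the abstract, the sketch below picks exactly one data reuse option per array reference so that total power is minimised within an on-chip memory budget. It is a textbook dynamic program, not the thesis's solver or power model; the function name `choose_reuse_options`, the option tuples, and all numbers are hypothetical.

```python
from typing import List, Optional, Tuple

# One reuse option: (buffer size in memory words, power cost).
# Both values are illustrative assumptions, not figures from the thesis.
Option = Tuple[int, float]


def choose_reuse_options(
    groups: List[List[Option]], capacity: int
) -> Optional[Tuple[float, List[int]]]:
    """Pick exactly one option per group, minimising total power.

    Returns (total_power, chosen_option_index_per_group), or None when no
    selection fits within `capacity` memory words.
    """
    inf = float("inf")
    # cost[m]: least power for the groups processed so far using <= m words.
    cost = [0.0] * (capacity + 1)
    picks: List[Optional[List[int]]] = [[] for _ in range(capacity + 1)]

    for options in groups:
        new_cost = [inf] * (capacity + 1)
        new_picks: List[Optional[List[int]]] = [None] * (capacity + 1)
        for m in range(capacity + 1):
            for idx, (words, power) in enumerate(options):
                # An infeasible predecessor has cost inf and never wins here.
                if words <= m and cost[m - words] + power < new_cost[m]:
                    new_cost[m] = cost[m - words] + power
                    new_picks[m] = picks[m - words] + [idx]
        cost, picks = new_cost, new_picks

    if cost[capacity] == inf:
        return None
    return cost[capacity], picks[capacity]


# Two array references, each with three hypothetical reuse options:
# no buffering, a small buffer, and a large buffer.
example_groups = [
    [(0, 10.0), (4, 6.0), (16, 3.0)],
    [(0, 8.0), (8, 4.0), (32, 1.0)],
]
result = choose_reuse_options(example_groups, capacity=24)
```

With a budget of 24 words, the example selects the large buffer for the first reference and the small buffer for the second, for a total power of 7.0. The running time grows with the number of references, the number of options per reference, and the memory budget, so this form is only practical when the budget is expressed in reasonably coarse units.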
Related resources
ILDJIT: a parallel dynamic compiler
Multi-core technology is being employed in most recent high-performance architectures. Such architectures need specifically designed multi-threaded software to exploit all the potentialities of their hardware parallelism. At the same time, object code virtualization technologies are achieving a growing popularity, as they allow higher levels of software portability and reuse. Thus, a virtual ex...
Supporting Higher-Order Virtualization
Virtualization is ubiquitous, with the global availability of the Java Virtual Machine and other similar virtual machine platforms. Higher-order virtualization involves building a stack of virtual machine layers. This provides obvious advantages such as: flexibility; separation of concerns; reuse of existing functionality; support for legacy platforms. However, the benefits of higher-order virt...
Automatic Analysis of Loops to
Configurable Arithmetic Logic Units (ALUs) offer opportunities for adapting the underlying hardware to the computation for efficiency. The problem of identifying the optimal configurations at different steps in a program is a very complex issue but allows the power of these ALUs to be maximally used if solved. This paper focuses on developing an automatic compilation framework for exploiting operato...
Exploiting Speculative Value Reuse Using Value Prediction
Data dependencies between instructions greatly impede instruction-level parallelism. Recently two hardware techniques – Value Prediction and Value Reuse – have been proposed to overcome the limits imposed by data dependencies. We introduce a new hardware scheme for exploiting speculative value reuse by using value prediction. We propose a new microarchitecture which uses value prediction to pro...
Parallelization in Co-Compilation for Configurable Accelerators - A Host / Accelerator Partitioning Compilation Method
The paper introduces a novel co-compiler and its “vertical” parallelization method, including a general model for co-operating host/accelerator platforms and a new parallelizing compilation technique derived from it. Small examples are used for illustration. It explains the exploi...